A holistic view of stream partitioning costs
Authors
Abstract
Stream processing has become the dominant processing model for monitoring and real-time analytics. Modern Parallel Stream Processing Engines (pSPEs) have made it feasible to increase the performance of both monitoring and analytical queries by parallelizing a query's execution and distributing the load across multiple workers. A determining factor for the performance of a pSPE is the partitioning algorithm used to disseminate tuples to workers. Until now, partitioning methods in pSPEs have been similar to those used in parallel databases, and only recently have load-aware algorithms been employed to improve the effectiveness of parallel execution. We identify and demonstrate the need to incorporate aggregation costs into the partitioning model when executing stateful operations in parallel, in order to minimize overall latency and/or maximize throughput. Towards this end, we propose new stream partitioning algorithms that consider both tuple imbalance and aggregation cost. We evaluate our proposed algorithms and show that they can achieve up to an order of magnitude better performance than the current state of the art.
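To make the trade-off between tuple imbalance and aggregation cost concrete, here is a minimal Python sketch under stated assumptions. It treats tuple imbalance as the maximum worker load minus the mean load, and aggregation cost as the number of per-key partial states spread across workers (each of which must be merged downstream); both definitions are illustrative assumptions, not the paper's formal cost model. `hash_partition` (key-based, using `zlib.crc32` because Python's built-in `hash()` on strings is randomized per process) and `shuffle_partition` (round-robin) stand in for the two classic schemes. `cost_aware_partition` and its weight `alpha` are hypothetical: a simple greedy heuristic, not the algorithms proposed in the paper.

```python
from zlib import crc32


def partition_costs(assignment, num_workers):
    """Return (tuple_imbalance, aggregation_cost) for a list of (key, worker) pairs.

    Assumed definitions: imbalance = max worker load - mean load;
    aggregation cost = number of (key, worker) partial states to merge downstream.
    """
    load = [0] * num_workers
    fragments = [set() for _ in range(num_workers)]
    for key, worker in assignment:
        load[worker] += 1
        fragments[worker].add(key)
    imbalance = max(load) - sum(load) / num_workers
    aggregation_cost = sum(len(keys) for keys in fragments)
    return imbalance, aggregation_cost


def hash_partition(stream, num_workers):
    # Key-based: every tuple of a key goes to the same worker (no merging, but skew-prone).
    return [(k, crc32(k.encode()) % num_workers) for k in stream]


def shuffle_partition(stream, num_workers):
    # Round-robin: perfectly even load, but a key's state may end up on every worker.
    return [(k, i % num_workers) for i, k in enumerate(stream)]


def cost_aware_partition(stream, num_workers, alpha=2.0):
    # Hypothetical greedy heuristic: score = current load, plus alpha if the worker
    # does not yet hold state for this key; pick the lowest score (ties -> lowest id).
    load = [0] * num_workers
    fragments = [set() for _ in range(num_workers)]
    assignment = []
    for k in stream:
        w = min(range(num_workers),
                key=lambda cand: load[cand] + (0 if k in fragments[cand] else alpha))
        load[w] += 1
        fragments[w].add(k)
        assignment.append((k, w))
    return assignment


if __name__ == "__main__":
    stream = ["a"] * 8 + ["b", "c", "d", "e"]
    for name, scheme in [("hash", hash_partition),
                         ("shuffle", shuffle_partition),
                         ("cost-aware", cost_aware_partition)]:
        print(name, partition_costs(scheme(stream, 4), 4))
```

On a skewed stream such as `["a"] * 8 + ["b", "c", "d", "e"]` with four workers, hash partitioning keeps each key on a single worker but sends all eight `a` tuples to the same worker, while round-robin balances the load perfectly at the price of a partial state for `a` on every worker. On this input the greedy heuristic matches round-robin's balance while creating fewer partial states; raising `alpha` pushes it further toward key affinity.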
Similar resources
(Re)partitioning for stream-enabled computation
Partitioning an input graph over a set of workers is a complex operation. Objectives are twofold: split the work evenly, so that every worker gets an equal share, and minimize edge cut to achieve a good work locality (i.e. workers can work independently). Partitioning a graph accessible from memory is a notorious NP-complete problem. Motivated by the regain of interest for the stream processing...
Holistic View in Medicine
Modern medicine was born in the modernist period, around two centuries ago, with a generally materialistic view and a purely biological approach to health and disease. This new approach was based on the widespread philosophical view of that era, especially the hypotheses of Claude Bernard. As our understanding of human biology has tremendously progressed, and the emergence of the postmodern era has o...
Holistic distributed stream clustering for smart grids
Smart grids consist of millions of automated electronic meters that will be installed in electricity distribution networks and connected to servers that will manage grid supervision, billing and customer services. World sustainability regarding energy management will definitely rely on such grids, so smart grids also need to be sustainable themselves. This sustainability depends on several rese...
The Sector–stream Matrix: Introducing a New Framework for the Analysis of Environmental Performance
Environmental strategy is currently in transition from a reductionist view of individual technologies in isolation to a holistic and interdisciplinary view of the relationship between society, technology, and environmental impact. As a contribution to this larger effort, this paper uses a systems analytic approach to develop a ‘sector-stream matrix’ of functions and objectives that have an impa...
Cockpit Crew Pairing Problem in Airline Scheduling: Shortest Path with Resources Constraints Approach
Increasing competition in the air transport market has intensified active airlines’ efforts to keep their market share by attaching due importance to cost management aimed at reduced final prices. Crew costs are second only to fuel costs on the cost list of airline companies. So, this paper attempts to investigate the cockpit crew pairing problem. The set partitioning problem has been used for ...
Journal title:
- PVLDB
Volume 10
Publication date: 2017